DiSCO: Distributed Optimization for Self-Concordant Empirical Loss
نویسندگان
چکیده
We propose a new distributed algorithm for empirical risk minimization in machine learning. The algorithm is based on an inexact damped Newton method, where the inexact Newton steps are computed by a distributed preconditioned conjugate gradient method. We analyze its iteration complexity and communication efficiency for minimizing self-concordant empirical loss functions, and discuss the results for distributed ridge regression, logistic regression and binary classification with a smoothed hinge loss. In a standard setting for supervised learning, where the n data points are i.i.d. sampled and when the regularization parameter scales as 1/ √ n, we show that the proposed algorithm is communication efficient: the required round of communication does not increase with the sample size n, and only grows slowly with the number of machines.
منابع مشابه
Distributed Inexact Damped Newton Method: Data Partitioning and Load-Balancing
In this paper we study inexact dumped Newton method implemented in a distributed environment. We start with an original DiSCO algorithm [Communication-Efficient Distributed Optimization of Self-Concordant Empirical Loss, Yuchen Zhang and Lin Xiao, 2015]. We will show that this algorithm may not scale well and propose an algorithmic modifications which will lead to less communications, better lo...
متن کاملCommunication-Efficient Distributed Optimization of Self-Concordant Empirical Loss
We consider distributed convex optimization problems originated from sample average approximation of stochastic optimization, or empirical risk minimization in machine learning. We assume that each machine in the distributed computing system has access to a local empirical loss function, constructed with i.i.d. data sampled from a common distribution. We propose a communication-efficient distri...
متن کاملPartitioning Data on Features or Samples in Communication-Efficient Distributed Optimization?
In this paper we study the effect of the way that the data is partitioned in distributed optimization. The original DiSCO algorithm [Communication-Efficient Distributed Optimization of SelfConcordant Empirical Loss, Yuchen Zhang and Lin Xiao, 2015] partitions the input data based on samples. We describe how the original algorithm has to be modified to allow partitioning on features and show its...
متن کاملSelf-concordant analysis for logistic regression
Most of the non-asymptotic theoretical work in regression is carried out for the square loss, where estimators can be obtained through closed-form expressions. In this paper, we use and extend tools from the convex optimization literature, namely self-concordant functions, to provide simple extensions of theoretical results for the square loss to the logistic loss. We apply the extension techni...
متن کاملAn interior-point Lagrangian decomposition method for separable convex optimization
In this paper we propose a distributed algorithm for solving large-scale separable convex problems using Lagrangian dual decomposition and the interior-point framework. By adding self-concordant barrier terms to the ordinary Lagrangian we prove under mild assumptions that the corresponding family of augmented dual functions is self-concordant. This makes it possible to efficiently use the Newto...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015